Motion planning is challenging for autonomous systems in multi-obstacle environments due to nonconvex collision avoidance constraints. Directly applying numerical solvers to these nonconvex formulations fails to exploit the constraint structure, resulting in excessive computation time. In this paper, we present an accelerated collision-free motion planner, the regularized dual alternating direction method of multipliers (RDADMM, or RDA for short), for model predictive control (MPC) based motion planning. RDA addresses the nonconvex problem by solving a smooth biconvex reformulation obtained via duality, and it allows the collision avoidance constraints to be computed in parallel for each obstacle, which significantly reduces computation time. We validate the RDA planner through path-tracking experiments with car-like robots in both simulation and real-world settings. Experimental results show that the proposed method generates smooth, collision-free trajectories with less computation time than other benchmarks and performs robustly in cluttered environments.
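The abstract does not spell out the algorithm, so the following is only a minimal consensus-ADMM sketch of the per-obstacle splitting idea it describes, under strong simplifying assumptions: circular obstacles, a point-mass trajectory, and a quadratic tracking cost; it is not the paper's RDA method.

```python
# Minimal consensus-ADMM sketch of parallel per-obstacle collision handling.
# Assumptions (not from the paper): circular obstacles, point-mass trajectory, quadratic tracking cost.
import numpy as np

def project_outside_circle(traj, center, radius):
    """Project every waypoint of `traj` onto the exterior of one circular obstacle."""
    out = traj.copy()
    d = traj - center
    dist = np.linalg.norm(d, axis=1)
    inside = dist < radius
    dist = np.where(dist < 1e-9, 1e-9, dist)          # avoid division by zero
    out[inside] = center + radius * (d[inside] / dist[inside, None])
    return out

def admm_plan(x_ref, obstacles, rho=1.0, iters=100):
    """x_ref: (T, 2) reference path; obstacles: list of (center(2,), radius)."""
    J = len(obstacles)
    x = x_ref.copy()
    z = [x.copy() for _ in range(J)]                  # per-obstacle trajectory copies
    u = [np.zeros_like(x) for _ in range(J)]          # scaled dual variables
    for _ in range(iters):
        # z-step: each obstacle's projection is independent, so it can run in parallel
        z = [project_outside_circle(x + uj, c, r) for uj, (c, r) in zip(u, obstacles)]
        # x-step: closed-form minimizer of 0.5||x - x_ref||^2 + (rho/2) sum_j ||x - z_j + u_j||^2
        x = (x_ref + rho * sum(zj - uj for zj, uj in zip(z, u))) / (1.0 + rho * J)
        # dual ascent
        u = [uj + x - zj for zj, uj in zip(z, u)]
    return x

waypoints = np.stack([np.linspace(0, 10, 50), np.zeros(50)], axis=1)
plan = admm_plan(waypoints, obstacles=[(np.array([5.0, 0.0]), 1.0)])
```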
Out-of-distribution (OOD) generalization is all about learning invariance against environmental changes. If the context distribution within every class were uniform, OOD would be trivial because the context could be easily removed thanks to an underlying principle: class is invariant to context. However, collecting such a balanced dataset is impractical. Learning from imbalanced data biases the model towards context and thus hurts OOD. Therefore, the key to OOD is context balancing. We argue that the assumption widely adopted in prior work, that the context bias can be directly annotated or estimated from biased class predictions, renders the context incomplete or even incorrect. In contrast, we point out the other side of the above principle: context is also invariant to class, which motivates us to consider the classes (which are already labeled) as the varying environments for resolving context bias (without context labels). We implement this idea by minimizing a contrastive loss of intra-class sample similarity while ensuring this similarity to be invariant across all classes. On benchmarks with various context biases and domain gaps, we show that a simple reweighting-based classifier equipped with our context estimation achieves state-of-the-art performance. We provide theoretical justifications in the appendix and code at https://github.com/simpleshinobu/irmcon.
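To make the "classes as environments" idea concrete, here is a hedged sketch, not the released IRMCon code: an InfoNCE-style intra-class contrastive loss per class plus an IRMv1-style penalty that keeps this loss invariant across classes. The temperature, pairing scheme, and penalty weight are assumptions.

```python
# Sketch of a per-class contrastive risk made invariant across classes (IRMv1-style penalty).
import torch
import torch.nn.functional as F

def intra_class_contrastive(feats, w=1.0):
    """InfoNCE over the samples of one class; each sample's positive is its cyclic neighbour."""
    feats = F.normalize(feats, dim=1)
    n = feats.size(0)
    logits = w * (feats @ feats.t()) / 0.1                               # temperature 0.1 (assumed)
    logits = logits.masked_fill(torch.eye(n, dtype=torch.bool), -1e9)    # drop self-similarity
    targets = torch.arange(n).roll(1)
    return F.cross_entropy(logits, targets)

def irm_penalty(feats):
    """Squared gradient of the loss w.r.t. a dummy logit scale (IRMv1 trick)."""
    w = torch.tensor(1.0, requires_grad=True)
    grad = torch.autograd.grad(intra_class_contrastive(feats, w), w, create_graph=True)[0]
    return grad ** 2

def context_loss(feats, labels, lam=1.0):
    """Treat each class as an environment: average the per-class risk, penalize its variation."""
    groups = [feats[labels == c] for c in labels.unique() if (labels == c).sum() > 1]
    risk = torch.stack([intra_class_contrastive(g) for g in groups]).mean()
    penalty = torch.stack([irm_penalty(g) for g in groups]).mean()
    return risk + lam * penalty
```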
We are interested in learning robust models from insufficient data, without the need for any external pre-trained checkpoints. First, compared with sufficient data, we show why insufficient data makes the model more easily biased towards the limited training environments, which usually differ from those seen at test time. For example, if all the training swan samples are "white", the model may wrongly use the "white" environment to represent the intrinsic class swan. Then, we justify that an equivariance inductive bias can retain the class feature while an invariance inductive bias can remove the environment feature, leaving class features that generalize to any environmental change at test time. To impose them on learning, we demonstrate that, for equivariance, any off-the-shelf contrastive-based self-supervised feature learning method can be deployed; for invariance, we propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotations in conventional IRM. State-of-the-art experimental results on real-world benchmarks (VIPriors, ImageNet100 and NICO) validate the great potential of equivariance and invariance for data-efficient learning. The code is available at https://github.com/wangt-cn/eqinv.
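As a reference point for the invariance stage, here is the standard IRMv1 objective over a set of environment partitions; the equivariance stage (contrastive self-supervised pre-training of `encoder`) is omitted, and how environments are formed without annotations is the paper's contribution, not shown here.

```python
# Hedged sketch: ERM risk plus the classic IRMv1 gradient penalty across environments.
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    """Squared gradient of the risk w.r.t. a fixed dummy classifier scale."""
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return grad ** 2

def invariant_risk(encoder, classifier, env_batches, lam=10.0):
    """env_batches: list of (images, labels), one entry per (pseudo-)environment."""
    risks, penalties = [], []
    for images, labels in env_batches:
        logits = classifier(encoder(images))
        risks.append(F.cross_entropy(logits, labels))
        penalties.append(irmv1_penalty(logits, labels))
    return torch.stack(risks).mean() + lam * torch.stack(penalties).mean()
```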
We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that this bias is the major visual-reasoning bottleneck. For example, grounding usually degenerates into a trivial language-location association without visual reasoning, e.g., grounding any language query containing "sheep" to the near-center region, because most ground-truth locations of sheep lie near the image center. First, we frame the visual grounding pipeline as a causal graph, which shows the causalities among image, query, target location and the underlying confounder. Through the causal graph, we learn how to break the grounding bottleneck: deconfounded visual grounding. Second, to tackle the challenge that the confounder is unobserved in general, we propose a confounder-agnostic approach, the Referring Expression Deconfounder (RED), to remove the confounding bias. Third, we implement RED as a simple language attention module that can be applied in any grounding method. On popular benchmarks, RED improves various state-of-the-art grounding methods by a significant margin. Code will be available soon at: https://github.com/jianqiangh/deconfounded_vg.
Vector graphic documents present multiple visual elements, such as images, shapes, and texts. Choosing appropriate colors for these elements is a difficult but crucial task for both amateurs and professional designers. Instead of creating a single palette for all elements, we extract multiple palettes, one from each visual element in a graphic document, and then combine them into a color sequence. We propose a masked color model for color sequence completion, which recommends colors with high probability given the color context of the multiple palettes. We train the model and build a color recommendation system on a large-scale dataset of vector graphic documents. The proposed method outperforms other state-of-the-art methods in color prediction under both quantitative and qualitative evaluations, and our color recommendation system received positive feedback from professional designers in an interview study.
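To illustrate what "masked color model" could mean mechanically, here is a minimal sketch under assumptions not stated in the abstract: colors are quantized into a discrete vocabulary, a small Transformer encoder reads the flattened palette sequence, and masked slots are filled by a softmax over the color vocabulary.

```python
# Hedged sketch of masked color-sequence completion (BERT-style masked prediction over color tokens).
import torch
import torch.nn as nn

VOCAB, MASK_ID = 512, 0            # 0 is reserved as the [MASK] token (assumed)

class MaskedColorModel(nn.Module):
    def __init__(self, d=128, layers=4, heads=4, max_len=64):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(max_len, d)
        enc = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):                      # tokens: (B, L) color ids, MASK_ID where unknown
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.encoder(self.tok(tokens) + self.pos(pos))
        return self.head(h)                         # (B, L, VOCAB) logits per palette slot

model = MaskedColorModel()
seq = torch.randint(1, VOCAB, (2, 12))              # two documents, 12 palette slots each
seq[:, 5] = MASK_ID                                 # ask the model to suggest colors for slot 5
probs = model(seq).softmax(-1)[:, 5]                # recommended color distribution for slot 5
```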
Face clustering is a promising way to scale up face recognition systems using large amounts of unlabeled face images. It remains challenging to identify small or sparse face image clusters, which we call hard clusters, due to the heterogeneity of clusters, i.e., high variations in size and sparsity. Consequently, the conventional practice of using a uniform threshold (to identify clusters) often leads to severe misclassification of samples that should belong to hard clusters. We tackle this problem by leveraging the neighborhood information of samples and inferring their cluster memberships in a probabilistic way. We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDE) and Transition-Probability-based Distance (TPDI), with which we can simply apply the standard density peak clustering algorithm with a uniform threshold. Our experiments on multiple benchmarks show that each module contributes to the final performance of our method, and that incorporating the two modules into other advanced face clustering methods boosts their performance to a new state of the art. Code is available at: https://github.com/echoanran/on-mitigating-hard-clusters.
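The following is only a rough sketch of the flavour of the two modules, not the paper's exact definitions: a kNN transition matrix diffuses per-sample density, and pairwise distances are derived from transition probabilities rather than raw feature distances, before handing over to standard density-peak clustering.

```python
# Hedged sketch: diffusion-based density and transition-probability-based distance on a kNN graph.
import numpy as np

def knn_transition_matrix(feats, k=10):
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)   # cosine similarity space
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)
    idx = np.argsort(-sim, axis=1)[:, :k]                          # k nearest neighbours per sample
    P = np.zeros_like(sim)
    rows = np.repeat(np.arange(len(feats)), k)
    P[rows, idx.ravel()] = np.exp(sim[rows, idx.ravel()])
    return P / P.sum(axis=1, keepdims=True)                        # row-stochastic transitions

def diffused_density(P, steps=20):
    d = np.full(P.shape[0], 1.0 / P.shape[0])                      # start from a uniform density
    for _ in range(steps):
        d = P.T @ d                                                # density flows along kNN transitions
    return d

def transition_distance(P):
    co = P @ P.T                                                   # chance of meeting via a shared neighbour
    return 1.0 - co / (co.max() + 1e-12)

# These quantities would then feed a standard density-peak clustering step with a
# single uniform threshold, which is the point the abstract makes.
```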
The Internet is the most complex machine humankind has ever built, and defending it against intrusions is even more complex. With the ever-increasing number of new intrusions, intrusion detection increasingly relies on artificial intelligence. The interpretability and transparency of machine learning models are the foundation of trust in AI-driven intrusion detection results. Current explainable-AI techniques are heuristic, which is neither accurate nor sufficient. This paper proposes a rigorously explainable, AI-driven intrusion detection approach based on the artificial immune system. The details of the rigorous explanation computation for a decision tree model are presented. The prime implicant explanations of benign traffic flows are derived as the rules for negative selection in the network immune system. Experiments are carried out in a real-life environment.
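To make the negative-selection idea tangible, here is a hedged, toy illustration rather than the paper's algorithm: conjunctive rules (standing in for prime implicants extracted from a decision tree's benign leaves) describe "self" traffic, and any flow covered by no benign rule is flagged. Feature names and thresholds are made up.

```python
# Toy negative-selection style matcher over hypothetical benign-traffic rules.
from dataclasses import dataclass

@dataclass
class Condition:
    feature: str
    op: str              # "<=" or ">"
    threshold: float

    def holds(self, flow):
        v = flow[self.feature]
        return v <= self.threshold if self.op == "<=" else v > self.threshold

# Each rule is a conjunction of conditions, e.g. a prime implicant of the benign class.
benign_rules = [
    [Condition("duration", "<=", 2.0), Condition("bytes_per_s", "<=", 1e4)],
    [Condition("syn_ratio", "<=", 0.2)],
]

def is_suspicious(flow):
    """Negative selection: accept the flow only if some benign rule fully covers it."""
    return not any(all(c.holds(flow) for c in rule) for rule in benign_rules)

print(is_suspicious({"duration": 30.0, "bytes_per_s": 5e5, "syn_ratio": 0.9}))   # True
```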
Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit when trained on only a few samples, meta-learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL), which learns to adapt a deep NN for few-shot learning tasks. Specifically, "meta" refers to training on multiple tasks, and "transfer" is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5-class, 1-shot) and (5-class, 5-shot) recognition tasks on two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach trained with the proposed HT meta-batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy. [An algorithm excerpt appended to this abstract describes hard-task sampling: after optimizing θ and Φ_S{1,2} by Eqs. 3-5, each class k in T(te) is evaluated and the class m with the lowest accuracy Acc_m is returned as the hard class.]
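As a hedged sketch of the scaling-and-shifting idea (a simplification, not the paper's exact SS operation): the large pre-trained weights are frozen, and only a per-output-channel scale on the conv weights plus an additive shift are meta-learned per task.

```python
# Hedged sketch: freeze pre-trained conv weights, meta-learn only scale and shift.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftConv(nn.Module):
    """Wraps a frozen, pre-trained conv layer; only scale and shift receive gradients."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():
            p.requires_grad_(False)                        # freeze the large pre-trained weights
        out_c = conv.out_channels
        self.scale = nn.Parameter(torch.ones(out_c, 1, 1, 1))
        self.shift = nn.Parameter(torch.zeros(out_c))

    def forward(self, x):
        w = self.conv.weight * self.scale                  # element-wise scaling of frozen weights
        b = (self.conv.bias if self.conv.bias is not None else 0) + self.shift
        return F.conv2d(x, w, b, stride=self.conv.stride,
                        padding=self.conv.padding, dilation=self.conv.dilation,
                        groups=self.conv.groups)

layer = ScaleShiftConv(nn.Conv2d(3, 16, 3, padding=1))
feat = layer(torch.randn(2, 3, 32, 32))                    # only scale/shift are trainable here
```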
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or can only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
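The following is a hedged sketch of what "distilling token relations" can look like, assuming the relation is the softmax-normalized token-to-token similarity within a layer; TinyMIM's exact relation targets (e.g., separate Q-K and V-V relations) may differ.

```python
# Hedged sketch: match teacher and student token-to-token relation maps with a KL loss.
import torch
import torch.nn.functional as F

def token_relation(tokens, tau=1.0):
    """tokens: (B, N, D) -> (B, N, N) row-stochastic token-to-token relation."""
    t = F.normalize(tokens, dim=-1)
    return F.softmax(t @ t.transpose(1, 2) / tau, dim=-1)

def relation_distill_loss(student_tokens, teacher_tokens):
    """KL divergence between teacher and student token relations (dimension-agnostic in D)."""
    s = token_relation(student_tokens).clamp_min(1e-8).log()
    t = token_relation(teacher_tokens)
    return F.kl_div(s, t, reduction="batchmean")

# Teacher target from an intermediate layer (finding 2 in the abstract), student from its last layer.
student_out = torch.randn(4, 196, 192)                  # ViT-Tiny-like token width (assumed)
with torch.no_grad():
    teacher_mid = torch.randn(4, 196, 768)              # ViT-Base-like intermediate features (assumed)
loss = relation_distill_loss(student_out, teacher_mid)
```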
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
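As a heavily simplified, hedged sketch of the input-level idea only (not the CMT architecture): both modalities' tokens receive position encodings derived from 3D coordinates, are concatenated into one token set, and a transformer decoder with object queries regresses box parameters. All layer sizes and the box parameterization are assumptions.

```python
# Toy sketch of 3D-position-encoded multi-modal tokens decoded by object queries.
import torch
import torch.nn as nn

class ToyCrossModalDetector(nn.Module):
    def __init__(self, d=256, num_queries=100):
        super().__init__()
        self.img_pos = nn.Linear(3, d)                  # 3D coords associated with image tokens (assumed)
        self.pts_pos = nn.Linear(3, d)                  # 3D coords of point-cloud tokens
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=3)
        self.queries = nn.Parameter(torch.randn(num_queries, d))
        self.box_head = nn.Linear(d, 7)                 # (x, y, z, w, l, h, yaw)

    def forward(self, img_tok, img_xyz, pts_tok, pts_xyz):
        mem = torch.cat([img_tok + self.img_pos(img_xyz),
                         pts_tok + self.pts_pos(pts_xyz)], dim=1)   # shared 3D-aware token space
        q = self.queries.unsqueeze(0).expand(img_tok.size(0), -1, -1)
        return self.box_head(self.decoder(q, mem))      # (B, num_queries, 7) box parameters

det = ToyCrossModalDetector()
boxes = det(torch.randn(2, 600, 256), torch.randn(2, 600, 3),
            torch.randn(2, 1024, 256), torch.randn(2, 1024, 3))
```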